Machine Learning on Sets of Documents Connected in Graphs

Authors

  • Janez Brank
  • Jure Leskovec
Abstract

This paper deals with the problem of machine learning on sets of documents connected into graphs. Our strategy is to represent each document by a diverse set of heterogeneous attributes, including traditional binary and categorical attributes, textual attributes, and attributes derived from the graphs. We present experiments on two datasets, showing the usefulness of graph-based attributes and the importance of weighting the different attributes suitably before learning. On the download estimation task, the approach presented here achieved the best results on the KDD Cup 2003 challenge.
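The strategy in the abstract represents each document by a mix of binary, categorical, textual and graph-derived attributes and weights them before learning. The minimal Python sketch below illustrates that idea. It is not the authors' implementation. The attribute blocks (`flags`, `category`, `text`, `graph`), the use of TF-IDF for text, in-degree and out-degree as the graph-derived attributes, and the per-block `weights` are all hypothetical choices made for illustration. Each block is scaled on its own and then multiplied by its weight, so the weights decide how much each kind of attribute contributes to the combined vector.

```python
# Illustrative sketch only, not the paper's implementation: block names, TF-IDF,
# degree-based graph attributes and the weights are hypothetical assumptions.
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    if norm == 0.0:
        return dict(vec)
    return {k: v / norm for k, v in vec.items()}


def tfidf_vectors(texts):
    """Bag-of-words TF-IDF with smoothed IDF; lowercase whitespace tokenization."""
    tokenized = [t.lower().split() for t in texts]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
         for term, count in Counter(toks).items()}
        for toks in tokenized
    ]


def graph_attributes(n_docs, edges):
    """In-degree and out-degree of each document in a directed graph of (src, dst) index pairs."""
    in_deg = [0.0] * n_docs
    out_deg = [0.0] * n_docs
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return [{"in_degree": i, "out_degree": o} for i, o in zip(in_deg, out_deg)]


def build_features(docs, edges, weights):
    """Combine heterogeneous attribute blocks into one weighted sparse vector per document.

    docs:    list of dicts with hypothetical keys 'flags' (name -> bool),
             'category' (str) and 'text' (str).
    edges:   (src, dst) index pairs, e.g. one document linking to another.
    weights: hypothetical per-block multipliers keyed by block name.
    """
    text_vecs = [l2_normalize(v) for v in tfidf_vectors([d["text"] for d in docs])]
    graph_vecs = graph_attributes(len(docs), edges)
    # Divide each graph attribute by its collection-wide maximum so it lies in [0, 1].
    for key in ("in_degree", "out_degree"):
        top = max(g[key] for g in graph_vecs) or 1.0
        for g in graph_vecs:
            g[key] /= top

    features = []
    for doc, text_vec, graph_vec in zip(docs, text_vecs, graph_vecs):
        vec = {}
        for name, value in doc["flags"].items():  # binary attributes as 0/1
            vec["flags:" + name] = weights["flags"] * float(value)
        vec["category:" + doc["category"]] = weights["category"]  # one-hot categorical
        for term, value in text_vec.items():  # unit-length TF-IDF, then weighted
            vec["text:" + term] = weights["text"] * value
        for name, value in graph_vec.items():  # scaled graph-derived attributes
            vec["graph:" + name] = weights["graph"] * value
        features.append(vec)
    return features


# Toy usage with made-up documents and links.
docs = [
    {"flags": {"has_abstract": True}, "category": "cs", "text": "graph mining methods"},
    {"flags": {"has_abstract": False}, "category": "math", "text": "spectral graph theory"},
    {"flags": {"has_abstract": True}, "category": "cs", "text": "text classification"},
]
edges = [(1, 0), (2, 0), (2, 1)]  # document 2 links to 0 and 1; document 1 links to 0
weights = {"flags": 1.0, "category": 1.0, "text": 2.0, "graph": 0.5}
vecs = build_features(docs, edges, weights)
print(vecs[0]["graph:in_degree"])  # 0.5
```

The resulting sparse vectors could be passed to any learner that accepts weighted feature vectors, such as a linear classifier or a nearest-neighbour method. Because the text block is normalized to unit length before weighting, changing the entries of `weights` is a direct way to experiment with the relative importance of the attribute types, which the abstract identifies as important.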

Similar resources

Simultaneous Similarity Learning and Feature-Weight Learning for Document Clustering

A key problem in document classification and clustering is learning the similarity between documents. Traditional approaches include estimating similarity between feature vectors of documents where the vectors are computed using TF-IDF in the bag-of-words model. However, these approaches do not work well when either similar documents do not use the same vocabulary or the feature vectors are not...

Classification of Graph Structures

Classification is a classical and fundamental data mining (machine learning) task in which individual items (objects) are divided into groups (classes) based on their features (attributes). Classification problems have been deeply researched as they have a large variety of applications. They appear in different fields of science and industry and may be solved using different algorithms and tech...

Splice Graphs and their Vertex-Degree-Based Invariants

Let $G_1$ and $G_2$ be simple connected graphs with disjoint vertex sets $V(G_1)$ and $V(G_2)$, respectively. For given vertices $a_1 \in V(G_1)$ and $a_2 \in V(G_2)$, a splice of $G_1$ and $G_2$ by vertices $a_1$ and $a_2$ is defined by identifying the vertices $a_1$ and $a_2$ in the union of $G_1$ and $G_2$. In this paper, we present exact formulas for computing some vertex-degree-based graph invariants of splice of graphs.

Different-Distance Sets in a Graph

A set of vertices $S$ in a connected graph $G$ is a different-distance set if, for any vertex $w$ outside $S$, no two vertices in $S$ have the same distance to $w$. The lower and upper different-distance number of a graph are the order of a smallest, respectively largest, maximal different-distance set. We prove that a different-distance set induces either a special type of path or an independent...

Design of an Intelligent Ontology Construction System Using an ART Neural Network and the C-value Method

In recent years, many efforts have been made to design ontology learning methods and automate the ontology construction process. Ontology construction is a time-consuming and costly procedure for almost all domains and applications, so automating it is one way to overcome the knowledge acquisition bottleneck in information systems and reduce the construction cost. In this artic...


Publication date: 2004